In [1]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import statsmodels.api as sm
import numpy as np

Q1 (10 points)

The heart dataframe contains the survival time after receiving a heart transplant, the age of the patient and whether or not the survival time was censored

Number of Observations - 69
Number of Variables - 3

Variable name definitions::

survival - Days after surgery until death
censors - indicates if an observation is censored. 1 is uncensored
age - age at the time of surgery

Answer the following questions with respect to the heart data set:

Sort the data frame by age in descending order (oldest at top) without making a copy
How many patients were censored?
What is the average age for uncensored patients under the age of 45?
Find the mean and standard deviation of age and survival time for each value of the cenoring variable.
Plot the linear regression of survival (y-axis) against age (x-axis) conditioned on censoring (i.e. either have two separate plots or a single plot using color to distinguish censored and uncensored patients).



In [2]:

    
heart = sm.datasets.heart.load_pandas().data
heart.head(n=6)



In [3]:

    
heart.sort_values('age', ascending=False, inplace=True)
heart.head()



In [4]:

    
sum(heart.censors == 0)









    Out[4]:





24



In [5]:

    
heart.loc[(heart.censors == 1) & (heart.age < 45), 'age'].mean()









    Out[5]:





38.418181818181814



In [6]:

    
heart.groupby(['censors']).agg(['mean', 'std'])



In [7]:

    
import seaborn as sns

sns.lmplot(data=heart, x='age', y='survival', hue='censors')
pass

Q2 (10 points)

Write a flatmap function that works like map except that the function given takes a list and returns a list of lists that is then flattened (4 points).

In other words, flatmap takes two arguments, a function and a list (or other iterable), just like map. Howevver the function given as the first agument takes a single argument and returns a list (or ohter iterable). In order to get a simple list back, we need to unravel the reuslting list of lists, hence the flatten part.

For example,

flatmap(lambda x: x.split(), ["hello world", "the quick dog"])

should return

["hello", "world", "the", "quick", "dog"]



In [8]:

    
def flatten(list_of_lists):
    """Flatten a list of lists."""
    for alist in list_of_lists:
        for item in alist:
            yield item

def flatmap(f, xs):
    """First map, then flatten result."""
    return flatten(map(f, xs))



In [9]:

    
list(flatmap(lambda x: x.split(), ["hello world", "the quick dog"]))









    Out[9]:





['hello', 'world', 'the', 'quick', 'dog']

Q3 (10 points)

An affine transformation of a vector $x$ is the operation $Ax + b$, where $A$ is a matrix and $b$ is a vector.

Write a function to perform an affine transformation.
Write a function to reverse the affine transformation
Perform an affine transformation of a random 3 by 3 matrix A, and random 3-vectors $x$ and $b$ drawn from the standard uniform distribution with random seed = 1234 and save the result as $y$. Perform the reverse affine transform on $y$ to recover the original vector $x$.



In [10]:

    
def affine(x, A, b):
    """Affine transofrm."""
    return A@x + b



In [11]:

    
def rev_affine(y, A, b):
    """Reverse affine transform."""
    return np.linalg.solve(A, (y - b))



In [12]:

    
A = np.random.random((3,3,))
b = np.random.random(3)
x = np.random.random(3)



In [13]:

    
x









    Out[13]:





array([ 0.91397072,  0.44084231,  0.73953615])



In [14]:

    
y = affine(x, A, b)
y









    Out[14]:





array([ 1.61972625,  1.53725841,  1.14072047])



In [15]:

    
rev_affine(y, A, b)









    Out[15]:





array([ 0.91397072,  0.44084231,  0.73953615])

Q4 (10 points)

You are given the following DNA sequecne in FASTA format.

dna = '''> A simulated DNA sequence.
TTAGGCAGTAACCCCGCGATAGGTAGAGCACGCAATCGTCAAGGCGTGCGGTAGGGCTTCCGTGTCTTACCCAAAGAAAC
GACGTAACGTTCCCCGGGCGGTTAAACCAAATCCACTTCACCAACGGCATAACGCGAAGCCCAAACTAAATCGCGCTCGA
GCGGACGCACATTCGCTAGGCTGTGTAGGGGCAGTCTCCGTTAAGGACGATTACCACGTGATGGTAGTTCGCAACATTGG
ACTGTCGGGAATTCCCGAAGGCACTTAAGCGGAGTCTTAGCGTACAGTAACGCAGTCCCGCGTGAACGACTGACAGATGA
'''

Remove the comment line and combine the 4 lines of nucleotide symbols into a single string
Count the frequecny of all 16 two-letter combinations in the string.



In [16]:

    
dna = '''> A simulated DNA sequence.
TTAGGCAGTAACCCCGCGATAGGTAGAGCACGCAATCGTCAAGGCGTGCGGTAGGGCTTCCGTGTCTTACCCAAAGAAAC
GACGTAACGTTCCCCGGGCGGTTAAACCAAATCCACTTCACCAACGGCATAACGCGAAGCCCAAACTAAATCGCGCTCGA
GCGGACGCACATTCGCTAGGCTGTGTAGGGGCAGTCTCCGTTAAGGACGATTACCACGTGATGGTAGTTCGCAACATTGG
ACTGTCGGGAATTCCCGAAGGCACTTAAGCGGAGTCTTAGCGTACAGTAACGCAGTCCCGCGTGAACGACTGACAGATGA
'''
dna = ''.join(dna.split('\n')[1:])
dna









    Out[16]:





'TTAGGCAGTAACCCCGCGATAGGTAGAGCACGCAATCGTCAAGGCGTGCGGTAGGGCTTCCGTGTCTTACCCAAAGAAACGACGTAACGTTCCCCGGGCGGTTAAACCAAATCCACTTCACCAACGGCATAACGCGAAGCCCAAACTAAATCGCGCTCGAGCGGACGCACATTCGCTAGGCTGTGTAGGGGCAGTCTCCGTTAAGGACGATTACCACGTGATGGTAGTTCGCAACATTGGACTGTCGGGAATTCCCGAAGGCACTTAAGCGGAGTCTTAGCGTACAGTAACGCAGTCCCGCGTGAACGACTGACAGATGA'



In [17]:

    
from collections import Counter

c = Counter(zip(dna[:-1], dna[1:]))
c









    Out[17]:





Counter({('A', 'A'): 26,
         ('A', 'C'): 25,
         ('A', 'G'): 22,
         ('A', 'T'): 11,
         ('C', 'A'): 21,
         ('C', 'C'): 20,
         ('C', 'G'): 33,
         ('C', 'T'): 12,
         ('G', 'A'): 19,
         ('G', 'C'): 24,
         ('G', 'G'): 22,
         ('G', 'T'): 23,
         ('T', 'A'): 19,
         ('T', 'C'): 17,
         ('T', 'G'): 11,
         ('T', 'T'): 14})



In [18]:

    
sum(c.values())









    Out[18]:





319

Q5 (10 points)

The code given below performs a stochastic gradient descent to fit a quadratic polynomila to $n$ data points. Maake the code run faster by:

Using numba JIT
using Cython

Some test code is provided. Please run this for your optimized versios to confirm that they give the same resuls as the orignal code.



In [19]:

    
def sgd(b, x, y, max_iter, alpha):
    n = x.shape[0]
    for i in range(max_iter):
        for j in range(n):
            b[0] -= alpha * (2*(b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[1] =- alpha * (2*x[j] * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[2] -= alpha * (2*x[j]**2 * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
    return b



In [20]:

    
%%time
np.random.seed(12345)
n = 10000
x = np.linspace(0, 10, n)
y = 2*x**2 + 6*x + 3 + np.random.normal(0, 5, n)
k = 100
alpha = 0.00001

b0 = np.random.random(3) 
b = sgd(b0, x, y, k, alpha)

yhat = b[0] + b[1]*x+ b[2]*x**2
idx = sorted(np.random.choice(n, 100))
plt.scatter(x[idx], y[idx])
plt.plot(x[idx], yhat[idx], c='red')
pass









    



CPU times: user 7.88 s, sys: 8.68 ms, total: 7.89 s
Wall time: 7.89 s



In [21]:

    
from numba import jit

@jit(nopython = True)
def sgd_numba(b, x, y, max_iter, alpha):
    n = x.shape[0]
    for i in range(max_iter):
        for j in range(n):
            b[0] -= alpha * (2*(b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[1] =- alpha * (2*x[j] * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[2] -= alpha * (2*x[j]**2 * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
    return b



In [22]:

    
# Call once to trigger compilation
sgd_numba(b0, x, y, k, alpha);



In [23]:

    
%%time
np.random.seed(12345)
n = 10000
x = np.linspace(0, 10, n)
y = 2*x**2 + 6*x + 3 + np.random.normal(0, 5, n)
k = 100
alpha = 0.00001

b0 = np.random.random(3) 
b = sgd_numba(b0, x, y, k, alpha)

yhat = b[0] + b[1]*x+ b[2]*x**2
idx = sorted(np.random.choice(n, 100))
plt.scatter(x[idx], y[idx])
plt.plot(x[idx], yhat[idx], c='red')
pass









    



CPU times: user 105 ms, sys: 3.06 ms, total: 108 ms
Wall time: 106 ms



In [24]:

    
%load_ext cython



In [25]:

    
%%cython -a
cimport cython

@cython.wraparound(False)
@cython.boundscheck(False)
def sgd_cython(double[:] b, double[:] x, double[:] y, int max_iter, double alpha):
    cdef int n = x.shape[0]
    cdef int i, j
    
    for i in range(max_iter):
        for j in range(n):
            b[0] -= alpha * (2*(b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[1] =- alpha * (2*x[j] * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[2] -= alpha * (2*x[j]**2 * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
    return b









    Out[25]:









    
    Cython: _cython_magic_a400bf20ba756fb3f523f534b55b11f1.pyx
    
    


Generated by Cython 0.25.2

    Yellow lines hint at Python interaction.

    Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: cimport cython
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 02: 
 03: @cython.wraparound(False)
 04: @cython.boundscheck(False)
+05: def sgd_cython(double[:] b, double[:] x, double[:] y, int max_iter, double alpha):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython = {"sgd_cython", (PyCFunction)__pyx_pw_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython, METH_VARARGS|METH_KEYWORDS, 0};
static PyObject *__pyx_pw_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  __Pyx_memviewslice __pyx_v_b = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_x = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_y = { 0, 0, { 0 }, { 0 }, { 0 } };
  int __pyx_v_max_iter;
  double __pyx_v_alpha;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("sgd_cython (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_b,&__pyx_n_s_x,&__pyx_n_s_y,&__pyx_n_s_max_iter,&__pyx_n_s_alpha,0};
    PyObject* values[5] = {0,0,0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
        case  4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_b)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        case  1:
        if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_x)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, 1); __PYX_ERR(0, 5, __pyx_L3_error)
        }
        case  2:
        if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_y)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, 2); __PYX_ERR(0, 5, __pyx_L3_error)
        }
        case  3:
        if (likely((values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_max_iter)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, 3); __PYX_ERR(0, 5, __pyx_L3_error)
        }
        case  4:
        if (likely((values[4] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_alpha)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, 4); __PYX_ERR(0, 5, __pyx_L3_error)
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "sgd_cython") < 0)) __PYX_ERR(0, 5, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 5) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
      values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
      values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
    }
    __pyx_v_b = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[0]); if (unlikely(!__pyx_v_b.memview)) __PYX_ERR(0, 5, __pyx_L3_error)
    __pyx_v_x = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[1]); if (unlikely(!__pyx_v_x.memview)) __PYX_ERR(0, 5, __pyx_L3_error)
    __pyx_v_y = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[2]); if (unlikely(!__pyx_v_y.memview)) __PYX_ERR(0, 5, __pyx_L3_error)
    __pyx_v_max_iter = __Pyx_PyInt_As_int(values[3]); if (unlikely((__pyx_v_max_iter == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 5, __pyx_L3_error)
    __pyx_v_alpha = __pyx_PyFloat_AsDouble(values[4]); if (unlikely((__pyx_v_alpha == (double)-1) && PyErr_Occurred())) __PYX_ERR(0, 5, __pyx_L3_error)
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 5, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("_cython_magic_a400bf20ba756fb3f523f534b55b11f1.sgd_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  __pyx_r = __pyx_pf_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_sgd_cython(__pyx_self, __pyx_v_b, __pyx_v_x, __pyx_v_y, __pyx_v_max_iter, __pyx_v_alpha);

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_sgd_cython(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_b, __Pyx_memviewslice __pyx_v_x, __Pyx_memviewslice __pyx_v_y, int __pyx_v_max_iter, double __pyx_v_alpha) {
  int __pyx_v_n;
  CYTHON_UNUSED int __pyx_v_i;
  int __pyx_v_j;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("sgd_cython", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_28);
  __Pyx_AddTraceback("_cython_magic_a400bf20ba756fb3f523f534b55b11f1.sgd_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __PYX_XDEC_MEMVIEW(&__pyx_v_b, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_v_x, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_v_y, 1);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__14 = PyTuple_Pack(8, __pyx_n_s_b, __pyx_n_s_x, __pyx_n_s_y, __pyx_n_s_max_iter, __pyx_n_s_alpha, __pyx_n_s_n, __pyx_n_s_i, __pyx_n_s_j); if (unlikely(!__pyx_tuple__14)) __PYX_ERR(0, 5, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__14);
  __Pyx_GIVEREF(__pyx_tuple__14);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython, NULL, __pyx_n_s_cython_magic_a400bf20ba756fb3f5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 5, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_sgd_cython, __pyx_t_1) < 0) __PYX_ERR(0, 5, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_codeobj__15 = (PyObject*)__Pyx_PyCode_New(5, 0, 8, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_cliburn_ipython_cython__c, __pyx_n_s_sgd_cython, 5, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__15)) __PYX_ERR(0, 5, __pyx_L1_error)
+06:     cdef int n = x.shape[0]
  __pyx_v_n = (__pyx_v_x.shape[0]);
 07:     cdef int i, j
 08: 
+09:     for i in range(max_iter):
  __pyx_t_1 = __pyx_v_max_iter;
  for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) {
    __pyx_v_i = __pyx_t_2;
+10:         for j in range(n):
    __pyx_t_3 = __pyx_v_n;
    for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
      __pyx_v_j = __pyx_t_4;
+11:             b[0] -= alpha * (2*(b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
      __pyx_t_5 = 0;
      __pyx_t_6 = 1;
      __pyx_t_7 = __pyx_v_j;
      __pyx_t_8 = 2;
      __pyx_t_9 = __pyx_v_j;
      __pyx_t_10 = __pyx_v_j;
      __pyx_t_11 = 0;
      *((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_11 * __pyx_v_b.strides[0]) )) -= (__pyx_v_alpha * (2.0 * ((((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_5 * __pyx_v_b.strides[0]) ))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_6 * __pyx_v_b.strides[0]) ))) * (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_7 * __pyx_v_x.strides[0]) ))))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_8 * __pyx_v_b.strides[0]) ))) * pow((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_9 * __pyx_v_x.strides[0]) ))), 2.0))) - (*((double *) ( /* dim=0 */ (__pyx_v_y.data + __pyx_t_10 * __pyx_v_y.strides[0]) ))))));
+12:             b[1] =- alpha * (2*x[j] * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
      __pyx_t_12 = __pyx_v_j;
      __pyx_t_13 = 0;
      __pyx_t_14 = 1;
      __pyx_t_15 = __pyx_v_j;
      __pyx_t_16 = 2;
      __pyx_t_17 = __pyx_v_j;
      __pyx_t_18 = __pyx_v_j;
      __pyx_t_19 = 1;
      *((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_19 * __pyx_v_b.strides[0]) )) = ((-__pyx_v_alpha) * ((2.0 * (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_12 * __pyx_v_x.strides[0]) )))) * ((((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_13 * __pyx_v_b.strides[0]) ))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_14 * __pyx_v_b.strides[0]) ))) * (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_15 * __pyx_v_x.strides[0]) ))))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_16 * __pyx_v_b.strides[0]) ))) * pow((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_17 * __pyx_v_x.strides[0]) ))), 2.0))) - (*((double *) ( /* dim=0 */ (__pyx_v_y.data + __pyx_t_18 * __pyx_v_y.strides[0]) ))))));
+13:             b[2] -= alpha * (2*x[j]**2 * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
      __pyx_t_20 = __pyx_v_j;
      __pyx_t_21 = 0;
      __pyx_t_22 = 1;
      __pyx_t_23 = __pyx_v_j;
      __pyx_t_24 = 2;
      __pyx_t_25 = __pyx_v_j;
      __pyx_t_26 = __pyx_v_j;
      __pyx_t_27 = 2;
      *((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_27 * __pyx_v_b.strides[0]) )) -= (__pyx_v_alpha * ((2.0 * pow((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_20 * __pyx_v_x.strides[0]) ))), 2.0)) * ((((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_21 * __pyx_v_b.strides[0]) ))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_22 * __pyx_v_b.strides[0]) ))) * (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_23 * __pyx_v_x.strides[0]) ))))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_24 * __pyx_v_b.strides[0]) ))) * pow((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_25 * __pyx_v_x.strides[0]) ))), 2.0))) - (*((double *) ( /* dim=0 */ (__pyx_v_y.data + __pyx_t_26 * __pyx_v_y.strides[0]) ))))));
    }
  }
+14:     return b
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_28 = __pyx_memoryview_fromslice(__pyx_v_b, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_28)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_28);
  __pyx_r = __pyx_t_28;
  __pyx_t_28 = 0;
  goto __pyx_L0;



In [26]:

    
%%time
np.random.seed(12345)
n = 10000
x = np.linspace(0, 10, n)
y = 2*x**2 + 6*x + 3 + np.random.normal(0, 5, n)
k = 100
alpha = 0.00001

b0 = np.random.random(3) 
b = sgd_cython(b0, x, y, k, alpha)

yhat = b[0] + b[1]*x+ b[2]*x**2
idx = sorted(np.random.choice(n, 100))
plt.scatter(x[idx], y[idx])
plt.plot(x[idx], yhat[idx], c='red')
pass









    



CPU times: user 104 ms, sys: 2.52 ms, total: 106 ms
Wall time: 105 ms



In [ ]:

	survival	censors	age
0	15.0	1.0	54.3
1	3.0	1.0	40.4
2	624.0	1.0	51.0
3	46.0	1.0	42.5
4	127.0	1.0	48.0
5	64.0	1.0	54.6

	survival	censors	age
17	60.0	1.0	64.5
21	47.0	1.0	61.5
12	730.0	1.0	58.4
8	23.0	1.0	56.9
47	63.0	1.0	56.4

	survival		age
	mean	std	mean	std
censors
0.0	664.625000	526.214604	41.729167	9.177026
1.0	223.288889	332.657182	48.484444	7.840163